Approximating L1-distances Between Mixture Distributions Using Random Projections

Authors

  • Satyaki Mahalanabis
  • Daniel Stefankovic
Abstract

We consider the problem of computing L1-distances between every pair of probability densities from a given family. We point out that the technique of Cauchy random projections [Ind06] in this context turns into stochastic integrals with respect to Cauchy motion. For piecewise-linear densities these integrals can be sampled from if one can sample from the stochastic integral of the function x ↦ (1, x). We give an explicit density function for this stochastic integral and present an efficient (exact) sampling algorithm. As a consequence we obtain an efficient algorithm to approximate the L1-distances with a small relative error. For piecewise-polynomial densities we show how to approximately sample from the distributions resulting from the stochastic integrals. This also results in an efficient algorithm to approximate the L1-distances, although our inability to get exact samples worsens the dependence on the parameters.
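The Cauchy random projection idea the abstract builds on can be illustrated in its simplest, finite-dimensional form. The sketch below is not the paper's stochastic-integral sampler for densities; it assumes the two distributions are given as probability vectors of equal length, and the names `cauchy_sketch` and `estimate_l1` are hypothetical. It relies on two standard facts: a linear combination of i.i.d. standard Cauchy variables with coefficients v is Cauchy with scale ‖v‖1, and the median of the absolute value of a Cauchy variable with scale d equals d.

```python
import math
import random
import statistics


def cauchy_sketch(vec, k, seed):
    """Project vec onto k directions with i.i.d. standard Cauchy entries.

    Using the same seed for every vector reproduces the same random
    matrix, which is what makes sketches comparable with one another.
    """
    rng = random.Random(seed)
    sketch = [0.0] * k
    for x in vec:
        for j in range(k):
            # Standard Cauchy via the inverse CDF: tan(pi * (U - 1/2)).
            c = math.tan(math.pi * (rng.random() - 0.5))
            sketch[j] += x * c
    return sketch


def estimate_l1(sketch_p, sketch_q):
    """Estimate ||p - q||_1 as the median of absolute sketch differences."""
    return statistics.median(abs(a - b) for a, b in zip(sketch_p, sketch_q))


# Illustrative (made-up) probability vectors.
p = [0.1, 0.2, 0.3, 0.4]
q = [0.4, 0.3, 0.2, 0.1]
exact = sum(abs(a - b) for a, b in zip(p, q))
approx = estimate_l1(cauchy_sketch(p, 2001, seed=0),
                     cauchy_sketch(q, 2001, seed=0))
print("exact:", exact, "estimate:", approx)
```

Each vector is sketched once, so all pairwise distances in a family can then be estimated from the sketches alone. The relative error of the median estimator shrinks roughly like 1/√k in the number of projections k.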

Similar resources

Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data

We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating p...

Very Sparse Stable Random Projections, Estimators and Tail Bounds for Stable Random Projections

The method of stable random projections [39, 41] is popular for data streaming computations, data mining, and machine learning. For example, in data streaming, stable random projections offer a unified, efficient, and elegant methodology for approximating the lα norm of a single data stream, or the lα distance between a pair of streams, for any 0 < α ≤ 2. [18] and [20] applied stable random pro...

Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections

For dimension reduction in l1, the method of Cauchy random projections multiplies the original data matrix A ∈ R^{n×D} with a random matrix R ∈ R^{D×k} (k ≪ min(n, D)) whose entries are i.i.d. samples of the standard Cauchy C(0, 1). Because of the impossibility results, one cannot hope to recover the pairwise l1 distances in A from B = AR ∈ R^{n×k} using linear estimators without incurring large errors. Howev...

One sketch for all: Theory and Application of Conditional Random Sampling

Conditional Random Sampling (CRS) was originally proposed for efficiently computing pairwise (l2, l1) distances, in static, large-scale, and sparse data. This study modifies the original CRS and extends CRS to handle dynamic or streaming data, which much better reflect the real-world situation than assuming static data. Compared with many other sketching algorithms for dimension reduct...

On Approximating the Lp Distances for p>2

Many applications in machine learning and data mining require computing pairwise lp distances in a data matrix A ∈ R . For massive high-dimensional data, computing all pairwise distances of A can be infeasible. In fact, even storing A or all pairwise distances of A in the memory may be also infeasible. For 0 < p ≤ 2, efficient small space algorithms exist, for example, based on the method of st...

Publication date: 2009